PTMs and half-lives
The half-life of a protein is the time it takes for the concentration of that protein to fall to half of its initial value. Protein half-lives can therefore be used as estimates of how long proteins reside in the cell.
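As a worked illustration of this definition, the sketch below assumes simple first-order (exponential) decay, which is an assumption and not something stated for the datasets used here. The function names are hypothetical.

```python
import math

def fraction_remaining(t_hours, half_life_hours):
    """Fraction of the initial amount left after t_hours (first-order decay assumed)."""
    return 0.5 ** (t_hours / half_life_hours)

def degradation_rate(half_life_hours):
    """First-order rate constant k (per hour) that corresponds to a half-life."""
    return math.log(2) / half_life_hours

# A protein with a 10 h half-life, sampled at a few time points.
for t in (0, 10, 20, 30):
    print(t, "h:", fraction_remaining(t, 10.0))
```

After one half-life half of the protein remains, after two half-lives a quarter, and so on.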
Proteins that reside longer in the cell may be more susceptible to oxidative damage.
It is assumed that each protein only has one modification. The proteins with no modifications are not identified.
Protein half-lives for short-lived proteins can be found here: Proteome-wide mapping of short-lived proteins in human cells (ScienceDirect)
Protein half-lives of long-lived proteins can be found here: Systematic analysis of protein turnover in primary cells (Nature Communications)
Check that phosphorylation is the most abundant (literature).
Proteins with a short half-life
Proteins can have varying half-lives
What do the half-lives depend on?
How are they measured?
Below is a comparison of the distribution of the half-lives that was found in literature and the distribution of a subset of those half-lives in the proteins found in the dataset.
This is to check the proteins present in a particular half-life interval.
These are the modifications for a particular protein.
I want to remove the proteins with very high values of log10(counts_norm_abund_len).
Outliers
Detecting the outliers:
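The rule used to flag outliers is not stated here. As one possible approach, the sketch below applies Tukey's upper fence (Q3 + 1.5 × IQR) to the log10 values. Both the rule and the example values are assumptions made for illustration.

```python
import statistics

def tukey_upper_fence(values, k=1.5):
    """Upper fence Q3 + k * IQR (assumed rule; the notebook does not name one)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 + k * (q3 - q1)

def split_outliers(values, k=1.5):
    """Return (kept, outliers), flagging only unusually high values."""
    fence = tukey_upper_fence(values, k)
    kept = [v for v in values if v <= fence]
    outliers = [v for v in values if v > fence]
    return kept, outliers

# Hypothetical log10(counts_norm_abund_len) values with one extreme protein.
log_counts = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 3.0]
kept, outliers = split_outliers(log_counts)
print("kept:", kept)
print("outliers:", outliers)
```

Only the upper tail is flagged because the aim above is to drop proteins with very high values.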
Check that the outliers are removed:
What is the resulting distribution?
SRRM2_HUMAN
2752 amino acids.
There is a huge peak if you look at the data that is only normalised by the number of raw files.
Looking at SRRM2_HUMAN in more detail.
What are the most common modifications in this protein?
PTMs
Using genes from GenAge is legitimate, so this can continue.
PTMs of interest:
PTMs that control autophagy
phosphorylation
ubiquitination -> need to use the new dataset
acetylation
oxPTMs
- you have a list of these
Methylation, e.g. of histones
Acylation -> need to get this from the paper.
Phosphorylation
This is already without outliers
- Only the modification [21]Phospho is present here.
Splitting the dataset in a group with phosphorylation proteins and another group with all remaining proteins.
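A minimal sketch of this split, assuming the data can be reduced to (protein, modification) pairs. The protein identifiers below are hypothetical. Only the modification labels follow the ones named in these notes.

```python
def split_by_modification(records, target):
    """Split proteins into those carrying `target` and all remaining proteins."""
    with_target = {protein for protein, mod in records if mod == target}
    all_proteins = {protein for protein, _ in records}
    return with_target, all_proteins - with_target

# Hypothetical (protein, modification) pairs.
records = [
    ("PROT_A", "[21]Phospho"),
    ("PROT_B", "[34]Methyl"),
    ("PROT_C", "[21]Phospho"),
    ("PROT_D", "[1]Acetyl"),
]

phospho, remaining = split_by_modification(records, "[21]Phospho")
print("phosphorylated:", sorted(phospho))
print("remaining:", sorted(remaining))
```

The grouping is done at the protein level, so each protein ends up in exactly one of the two groups.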
It is not necessary to include another density line with all of the proteins. You can just compare the two distributions.
Comparison
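Testing whether the half-lives differ significantly between the two groups with a Wilcoxon rank-sum test (note that the sample sizes are uneven). The R output labels it as a design-based Kruskal-Wallis test; with two groups, both tests address the same hypothesis. The significance cut-off was adjusted using the formula p/sqrt(N/100), where p is the unadjusted significance level and N = n1 + n2. The reported cut-offs are consistent with p = 0.05.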
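The cut-off formula can be checked numerically. The sketch below assumes a starting level of 0.05 and takes N as the degrees of freedom plus 2; both are inferred from the reported output, not stated explicitly. The function name is hypothetical.

```python
import math

def adjusted_cutoff(n_total, alpha=0.05):
    """Sample-size-adjusted cut-off: alpha / sqrt(N / 100)."""
    return alpha / math.sqrt(n_total / 100)

# N inferred as df + 2 from the reported test output.
for df in (1164, 1174, 2353):
    n_total = df + 2
    print(n_total, adjusted_cutoff(n_total))
```

For N = 1166 this gives about 0.014643, matching the "P-value cut-off" printed with the tests that have df = 1164. A test is then called significant when its p-value falls below this cut-off.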
Design-based KruskalWallis test
data: mean_hl_hours ~ mod_group
t = 2.4341, df = 1164, p-value = 0.01508
alternative hypothesis: true difference in mean rank score is not equal to 0
sample estimates:
difference in mean rank score
0.09386352
[1] "P-value cut-off"
[1] 0.01464269
Enriched proteins in both datasets:
Proteins that are only present in one of the dataframes.
Acetylation
- Filtered by the [1]Acetyl modification.
Design-based KruskalWallis test
data: mean_hl_hours ~ mod_group
t = 3.5568, df = 1164, p-value = 0.0003905
alternative hypothesis: true difference in mean rank score is not equal to 0
sample estimates:
difference in mean rank score
0.1638661
[1] "P-value cut-off"
[1] 0.01464269
Ubiquitination
Ubiquitination has the classification ‘Other’. Take that as one group. The second group is all of the other PTMs. 890 proteins overlap, which leaves 289 proteins that are not ubiquitinated, carry other PTMs and have a known half-life. These make up the second group.
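The group construction described above amounts to a few set operations. This is a sketch with hypothetical protein identifiers; the real group sizes are the ones quoted in the text.

```python
def build_groups(ubiquitinated, other_ptm, with_half_life):
    """Return (ubiquitinated group, non-ubiquitinated PTM group), half-life known."""
    group_ub = ubiquitinated & with_half_life
    group_other = (other_ptm - ubiquitinated) & with_half_life
    return group_ub, group_other

# Hypothetical identifiers for illustration only.
ubiquitinated = {"P1", "P2", "P3"}
other_ptm = {"P2", "P3", "P4", "P5"}
with_half_life = {"P1", "P2", "P4", "P5", "P6"}

group_ub, group_other = build_groups(ubiquitinated, other_ptm, with_half_life)
print(sorted(group_ub), sorted(group_other))
```

Proteins that appear in both inputs are counted as ubiquitinated, so the two groups never share a protein.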
Design-based KruskalWallis test
data: mean_hl_hours ~ mod_group
t = -2.3096, df = 1174, p-value = 0.02108
alternative hypothesis: true difference in mean rank score is not equal to 0
sample estimates:
difference in mean rank score
-0.09965387
[1] "P-value cut-off"
[1] 0.0145803
Methylation
- Filtered by the [34]Methyl modification
Violin plots
Design-based KruskalWallis test
data: mean_hl_hours ~ mod_group
t = 0.90091, df = 1164, p-value = 0.3678
alternative hypothesis: true difference in mean rank score is not equal to 0
sample estimates:
difference in mean rank score
0.04207529
[1] "P-value cut-off"
[1] 0.01464269
Enrichment:
oxPTMs
This is only for proteins that are related to ageing.
All PTMs related to oxidative damage in general, not only oxidation.
Design-based KruskalWallis test
data: mean_hl_hours ~ mod_group
t = -1.5502, df = 2353, p-value = 0.1212
alternative hypothesis: true difference in mean rank score is not equal to 0
sample estimates:
difference in mean rank score
-0.1219495
[1] "P-value cut-off"
[1] 0.01030326
Lysine acylations
Violin plot
Design-based KruskalWallis test
data: mean_hl_hours ~ mod_group
t = -0.53643, df = 1650, p-value = 0.5917
alternative hypothesis: true difference in mean rank score is not equal to 0
sample estimates:
difference in mean rank score
-0.04068778
[1] "P-value cut-off"
[1] 0.0123017
AGEs
Violin plots
Design-based KruskalWallis test
data: mean_hl_hours ~ mod_group
t = -1.5027, df = 1453, p-value = 0.1331
alternative hypothesis: true difference in mean rank score is not equal to 0
sample estimates:
difference in mean rank score
-0.1160658
[1] "P-value cut-off"
[1] 0.01310806
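To read the short-lived results side by side, the sketch below copies each p-value and its cut-off from the test output above and lists the PTM groups whose p-value falls below the cut-off. The variable names are hypothetical.

```python
# (p-value, cut-off) pairs copied from the short-lived test output above.
short_lived_tests = {
    "Phosphorylation": (0.01508, 0.01464269),
    "Acetylation": (0.0003905, 0.01464269),
    "Ubiquitination": (0.02108, 0.0145803),
    "Methylation": (0.3678, 0.01464269),
    "oxPTMs": (0.1212, 0.01030326),
    "Lysine acylations": (0.5917, 0.0123017),
    "AGEs": (0.1331, 0.01310806),
}

significant = [name for name, (p, cutoff) in short_lived_tests.items() if p < cutoff]

for name, (p, cutoff) in short_lived_tests.items():
    verdict = "significant" if p < cutoff else "not significant"
    print(f"{name}: p = {p}, cut-off = {cutoff} -> {verdict}")
```

With these numbers only acetylation passes its adjusted cut-off. Phosphorylation (p = 0.01508) narrowly misses it.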
Binning
Hypothesis: The longer the half-life, the greater the number of PTMs.
Phosphorylation
oxPTMs
methylation
Ubiquitination, acetylation, lysine acylations, AGEs
General:
Broken down by the modifications
Check the number of proteins in each bin.
# A tibble: 5 × 2
  hl_group protein_count
  <chr>            <int>
1 0-5                234
2 10-15              256
3 15-20              182
4 20+                286
5 5-10               226
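The binning behind this table can be sketched as follows. The bin labels come from the table; how values exactly on a boundary are assigned is not stated, so the left-closed intervals used here are an assumption, and the half-life values are hypothetical.

```python
from collections import Counter

def hl_group(half_life_hours):
    """Assign a half-life (hours) to a bin; left-closed intervals are assumed."""
    if half_life_hours < 5:
        return "0-5"
    if half_life_hours < 10:
        return "5-10"
    if half_life_hours < 15:
        return "10-15"
    if half_life_hours < 20:
        return "15-20"
    return "20+"

# Hypothetical mean half-lives in hours.
half_lives = [1.2, 4.9, 5.0, 9.5, 12.0, 17.3, 20.0, 45.8]
counts = Counter(hl_group(h) for h in half_lives)

# Sorting the labels as text reproduces the row order seen in the table.
for label in sorted(counts):
    print(label, counts[label])
```

Because the bin labels are character strings, they sort alphabetically, which is why "5-10" appears after "20+" in the table.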
oxPTMs + phospho
# A tibble: 15 × 3
# Groups: hl_group [5]
   hl_group mod_group      protein_count
   <chr>    <chr>                  <int>
 1 0-5      -                        233
 2 0-5      Phosphorylated           165
 3 0-5      oxPTMs                   227
 4 10-15    -                        253
 5 10-15    Phosphorylated           207
 6 10-15    oxPTMs                   254
 7 15-20    -                        181
 8 15-20    Phosphorylated           146
 9 15-20    oxPTMs                   184
10 20+      -                        285
11 20+      Phosphorylated           235
12 20+      oxPTMs                   285
13 5-10     -                        220
14 5-10     Phosphorylated           161
15 5-10     oxPTMs                   221
Proteins with a long half-life
Long-lived proteins can be used as estimators of chronological age. Long-lived proteins can be defined in different ways, for example by comparing the half-life of a protein with the average half-life of proteins in the organism. In this case, long-lived proteins were obtained from the following study: paper. Proteins were classified as long-lived based on their degree of degradation during the experiment, so new long-lived proteins could be discovered (no a priori assumptions were made).
The study identified a list of long-lived proteins in rats, so the human orthologs of these proteins were identified.
Plot the data distributions
Outliers
Checking that the outliers have been removed.
All of the outliers have been removed.
Check the distribution of the half-lives:
Remove the proteins with very large half-lives:
Now the exact same thing but for `human_complete_hl_long`
PTMs
Phosphorylation
Violin plot:
Design-based KruskalWallis test
data: mean_hl_hours ~ mod_group
t = -0.90735, df = 2491, p-value = 0.3643
alternative hypothesis: true difference in mean rank score is not equal to 0
sample estimates:
difference in mean rank score
-0.03385422
[1] "P-value cut-off"
[1] 0.01001403
Acetylation
Comparison:
Design-based KruskalWallis test
data: mean_hl_hours ~ mod_group
t = 5.7916, df = 2491, p-value = 7.849e-09
alternative hypothesis: true difference in mean rank score is not equal to 0
sample estimates:
difference in mean rank score
0.2002254
[1] "P-value cut-off"
[1] 0.01001403
Ubiquitination
Comparison
Design-based KruskalWallis test
data: mean_hl_hours ~ mod_group
t = 0.36469, df = 2553, p-value = 0.7154
alternative hypothesis: true difference in mean rank score is not equal to 0
sample estimates:
difference in mean rank score
0.01750828
[1] "P-value cut-off"
[1] 0.009891782
Methylation
Design-based KruskalWallis test
data: mean_hl_hours ~ mod_group
t = 3.0915, df = 2491, p-value = 0.002014
alternative hypothesis: true difference in mean rank score is not equal to 0
sample estimates:
difference in mean rank score
0.1590465
[1] "P-value cut-off"
[1] 0.01001403
oxPTMs
Comparing
Design-based KruskalWallis test
data: mean_hl_hours ~ mod_group
t = 5.8874, df = 5089, p-value = 4.175e-09
alternative hypothesis: true difference in mean rank score is not equal to 0
sample estimates:
difference in mean rank score
0.0890046
[1] "P-value cut-off"
[1] 0.007007586
Lysine acylations
Design-based KruskalWallis test
data: mean_hl_hours ~ mod_group
t = 2.5603, df = 4821, p-value = 0.01049
alternative hypothesis: true difference in mean rank score is not equal to 0
sample estimates:
difference in mean rank score
0.09158268
[1] "P-value cut-off"
[1] 0.00719965
AGEs
Violin plots:
Design-based KruskalWallis test
data: mean_hl_hours ~ mod_group
t = -1.5027, df = 1453, p-value = 0.1331
alternative hypothesis: true difference in mean rank score is not equal to 0
sample estimates:
difference in mean rank score
-0.1160658
[1] "P-value cut-off"
[1] 0.01310806
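For the long-lived dataset, the sketch below collects each reported p-value, cut-off and difference in mean rank score from the output above, then keeps the PTM groups that pass their cut-off together with the sign of the estimate. The variable names are hypothetical, and the sign is reported as printed without assuming which group it favours.

```python
# (PTM group, p-value, cut-off, difference in mean rank score), as reported above.
long_lived_tests = [
    ("Phosphorylation", 0.3643, 0.01001403, -0.03385422),
    ("Acetylation", 7.849e-09, 0.01001403, 0.2002254),
    ("Ubiquitination", 0.7154, 0.009891782, 0.01750828),
    ("Methylation", 0.002014, 0.01001403, 0.1590465),
    ("oxPTMs", 4.175e-09, 0.007007586, 0.0890046),
    ("Lysine acylations", 0.01049, 0.00719965, 0.09158268),
    ("AGEs", 0.1331, 0.01310806, -0.1160658),
]

def significant_rows(rows):
    """Keep rows whose p-value is below the adjusted cut-off."""
    return [row for row in rows if row[1] < row[2]]

for name, p, cutoff, diff in significant_rows(long_lived_tests):
    sign = "positive" if diff > 0 else "negative"
    print(f"{name}: p = {p} < {cutoff}, {sign} difference in mean rank score")
```

With these numbers acetylation, methylation and oxPTMs pass their cut-offs. Lysine acylations (p = 0.01049) would pass an unadjusted 0.05 level but not the adjusted cut-off.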
Binning
oxPTMs: